feat(vad): bundle optimized silero vad and deprecate the plugin #5800
chenghao-mou wants to merge 1 commit into
    *,
    model: VADModels = "silero",
    min_speech_duration: float = 0.05,
    min_silence_duration: float = 0.55,
Should we follow the same defaults as Silero? IMO we should use 0.1 now. It shouldn't have any side effect and it is closer to the truth
I'd love to, but I wasn't sure if you are going to merge #5788 first.
Tested locally and it works well. Could you add a comparison in the PR description?
    from .vad import VAD

    vad_instance = VAD(model="silero")
🟡 Auto-created VAD for Speechmatics STT uses drastically lower min_silence_duration default (0.1s vs 0.55s)
In _resolve_vad_for_model, the auto-created VAD for Speechmatics STT models changed from SileroVAD.load() (which had min_silence_duration=0.55) to inference.VAD(model="silero") (which defaults to min_silence_duration=0.1). This 5.5× reduction means END_OF_SPEECH events fire much sooner for Speechmatics STT users who don't provide their own VAD. While the audio EOT detector (_maybe_apply_vad_silence_override in livekit-agents/livekit/agents/voice/audio_recognition.py:669) bumps this to at least 0.25s when the audio turn detector is active, users with text-based turn detectors (e.g., MultilingualModel) will experience the full 0.1s default — significantly different from the previous 0.55s behavior.
Suggested change:
- from .vad import VAD
- vad_instance = VAD(model="silero")
+ from .vad import VAD
+ vad_instance = VAD(model="silero", min_silence_duration=0.55)
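To make the timing difference in this comment concrete, here is a minimal, self-contained sketch. It is not the livekit or Silero implementation: it assumes a hypothetical 10 ms frame size and one boolean speech flag per frame, and the names end_of_speech_ms and effective_min_silence_ms are illustrative only. The 250 ms floor mirrors the 0.25s override described above for the audio turn detector.

```python
FRAME_MS = 10  # hypothetical frame size, chosen only for this illustration


def effective_min_silence_ms(configured_ms: int, audio_turn_detector: bool) -> int:
    """Apply the 250 ms floor only when the audio turn detector is active."""
    return max(configured_ms, 250) if audio_turn_detector else configured_ms


def end_of_speech_ms(frames, min_silence_ms: int):
    """Return the time (ms) at which the first END_OF_SPEECH would fire, or None.

    frames: iterable of booleans, True meaning the frame contains speech.
    """
    in_speech = False
    silence_ms = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            in_speech, silence_ms = True, 0  # any speech resets the silence run
        elif in_speech:
            silence_ms += FRAME_MS
            if silence_ms >= min_silence_ms:
                return (index + 1) * FRAME_MS
    return None


# 500 ms of speech, a 300 ms mid-sentence pause, 500 ms of speech, 1 s of silence.
utterance = [True] * 50 + [False] * 30 + [True] * 50 + [False] * 100
```

On this input, a 100 ms threshold ends the turn at 600 ms, inside the mid-sentence pause, while a 550 ms threshold waits until 1850 ms. With the 250 ms floor applied, the turn ends at 750 ms.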
- Add the compiled silero vad from livekit-local-inference; expose as inference.VAD(model="silero").
- Forkserver preload uses livekit.agents.inference._warmup as a side-effect module that calls init_vad() and init_eot() in the forkserver process so forked jobs inherit weight pages via COW.
- Drop the prewarm vad pattern from examples; inline construction in AgentSession is now the recommended form.
- Update tests for the new vad min_silence default (0.25s) and log message.

Squashed from chenghao/feat/inline-silero-vad rebase onto feat/AGT-2520-multimodal-EOU.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
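The preload idea in this commit message can be illustrated with a small POSIX-only sketch. It is an analogy under stated assumptions, not the actual _warmup module: a plain os.fork() stands in for the forkserver, a bytes object stands in for native weights, and init_weights and job_loaded_weights_itself are hypothetical names. It only shows that a child forked after the load sees the weights without loading them again; it does not measure page sharing.

```python
import os

# Hypothetical stand-in for native model state; a dict avoids `global` statements.
_STATE = {"weights": None, "loaded_by_pid": None}


def init_weights() -> bytes:
    """Load the (fake) weights if this process does not already have them."""
    if _STATE["weights"] is None:
        _STATE["weights"] = bytes(1_000_000)  # pretend this is an expensive load
        _STATE["loaded_by_pid"] = os.getpid()
    return _STATE["weights"]


def job_loaded_weights_itself() -> bool:
    """Fork a 'job' and report whether the child had to load the weights itself."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process: the "job"
        try:
            os.close(read_fd)
            init_weights()  # no-op when the parent loaded before forking
            loaded_here = _STATE["loaded_by_pid"] == os.getpid()
            os.write(write_fd, b"1" if loaded_here else b"0")
        finally:
            os._exit(0)  # never fall through into the parent's code path
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        flag = reader.read()  # returns once the child has exited
    os.waitpid(pid, 0)
    return flag == b"1"
```

Calling job_loaded_weights_itself() before init_weights() in the parent reports True (the job pays the load cost); calling it afterwards reports False, because the forked job inherits the already-loaded state.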
0526efb to 122cd9c
    "livekit-local-inference>=0.2.5",
    "livekit-protocol>=1.1.9,<2",
    "livekit-blingfire~=1.1,<2",
    "livekit-local-inference>=0.2.5",
🟡 Duplicate livekit-local-inference dependency entry in pyproject.toml
This PR adds a second "livekit-local-inference>=0.2.5" entry on line 37, while one already exists on line 34. While most Python build tools deduplicate dependencies gracefully, this is clearly unintentional and could cause confusion or subtle issues with tooling that doesn't handle duplicates.
Suggested change:
- "livekit-local-inference>=0.2.5",
Why
Silero VAD is the default endpointing implementation for voice agents, but lived behind a separate livekit-plugins-silero install step. That extra hop made the standard quickstart longer than it needed to be, and the plugin's onnxruntime-based loader paid the full model load cost in every job process (no fork-time sharing).

This PR moves Silero VAD into livekit-agents core, backed by livekit-local-inference. The plugin stays installable as a deprecated shim until v2.0, and existing call sites continue to work: they transparently route to the new implementation when settings are compatible.

This PR also introduces changes to follow the official silero settings, similar to #5788:
- min_silence_duration from 0.55s to 0.1s.

Code example
Before
After
No prewarm_fnc, no silero plugin import, no proc.userdata shuttle. Weights are loaded once in the forkserver and inherited by every job process via COW.

API change
Before: pip install livekit-agents livekit-plugins-silero
After: pip install livekit-agents (Silero is bundled)

Before: from livekit.plugins.silero import VAD; vad = VAD.load(min_silence_duration=0.4)
After: from livekit.agents import inference; vad = inference.VAD(model="silero", min_silence_duration=0.4)

Before: silero.VAD.load() did a heavy onnxruntime session construction → expected to live behind a prewarm hook
After: inference.VAD(model="silero") is a cheap wrapper; weights are loaded once at forkserver-preload time, inherited via COW

VAD / VADStream / OnnxModel (~650 LOC)

Before: silero.VAD.load(force_cpu=False, sample_rate=16000) ran onnxruntime; with custom onnx_file_path, used a user-supplied model
After: silero.VAD.load(...) transparently delegates to inference.VAD(model="silero", ...) when settings are compatible; 8 kHz + onnx_file_path still routes to the legacy onnxruntime path

Before: vad: NotGivenOr[vad.VAD] = NOT_GIVEN in AgentSession.__init__; vad=None was illegal per type, even though the code accepted it
After: vad: NotGivenOr[vad.VAD | None] = NOT_GIVEN; vad=None now type-legal as an explicit "no VAD" signal

from livekit.agents.inference import VAD

Behaviour change
livekit.agents.inference._warmup once → init_vad() + init_eot() page native weights into the forkserver. Jobs fork with weights already resident.

import livekit.plugins.silero: silent DeprecationWarning pointing to inference.VAD(model="silero"); v2.0 removal target

Before: silero.VAD.load(force_cpu=False) honored the user's GPU request via onnxruntime
After: force_cpu=False is ignored (native lib is CPU-only) → now emits a WARNING explaining the kwarg is ignored and pointing at onnx_file_path as the legacy escape hatch

session.vad is None
None.

Migration
silero.VAD.load() with default settings
from livekit.agents import inference; inference.VAD(model="silero")

silero.VAD.load(sample_rate=8000)

silero.VAD.load(onnx_file_path=...)

silero.VAD.load(force_cpu=False)
force_cpu is ignored on the delegated native path
inference.VAD(model="silero") (and accept CPU) or keep the plugin form with onnx_file_path=... to keep the legacy path
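The routing described in the migration notes can be summarized as a small decision function. This is a hedged sketch rather than the shim's actual code: route_load is a hypothetical name, the returned strings are placeholders for the real warning and log output, and it assumes that either sample_rate=8000 or a custom onnx_file_path on its own is enough to stay on the legacy onnxruntime path.

```python
from typing import List, Optional, Tuple


def route_load(
    sample_rate: int = 16000,
    onnx_file_path: Optional[str] = None,
    force_cpu: bool = True,
) -> Tuple[str, List[str]]:
    """Return (backend, notices) for a hypothetical silero.VAD.load() shim."""
    # The plugin is deprecated regardless of which backend ends up being used.
    notices = ['DeprecationWarning: use inference.VAD(model="silero")']

    # Assumption: 8 kHz audio or a user-supplied model keeps the legacy path.
    if onnx_file_path is not None or sample_rate == 8000:
        return "legacy-onnxruntime", notices

    # The native library is CPU-only, so a GPU request cannot be honored here.
    if not force_cpu:
        notices.append(
            "WARNING: force_cpu=False is ignored on the native path; "
            "pass onnx_file_path to keep the legacy path"
        )
    return "native", notices
```

For example, route_load() selects the native backend, route_load(sample_rate=8000) and route_load(onnx_file_path="model.onnx") stay on the legacy path, and route_load(force_cpu=False) selects the native backend with an extra notice about the ignored kwarg.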